Interaction terms in regression models

MACS 30200
University of Chicago

May 7, 2017

Additive model

\[Y = \beta_0 + \beta_1 X + \beta_2 Z + e\]

Additive model

\[E(Y) = \beta_0 + \beta_1 X + \beta_2 Z\]

\[\frac{\partial E(Y)}{\partial X} = \beta_1\]

\[\frac{\partial E(Y)}{\partial Z} = \beta_2\]

Multiplicative interaction model

\[Y = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ + e\]

  • Direct effects
  • Constitutive terms
  • Interaction term

Multiplicative interaction model

\[ \begin{split} E(Y) & = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ \\ & = \beta_0 + \beta_2 Z + (\beta_1 + \beta_3 Z) X \end{split} \]

\[\frac{\partial E(Y)}{\partial X} = \beta_1 + \beta_3 Z\]

\[E(Y) = \beta_0 + \beta_2 Z + \psi_1 X\]

\[ \begin{split} E(Y) & = \beta_0 + \beta_1 X + (\beta_2 + \beta_3 X) Z \\ & = \beta_0 + \beta_1 X + \psi_2 Z \end{split} \]

Multiplicative interaction model

  • Conditional impact
  • If \(Z = 0\), then:

    \[ \begin{split} E(Y) & = \beta_0 + \beta_1 X + \beta_2 (0) + \beta_3 X (0) \\ & = \beta_0 + \beta_1 X \end{split} \]

  • If \(X = 0\), then:

    \[ \begin{split} E(Y) & = \beta_0 + \beta_1 (0) + \beta_2 Z + \beta_3 (0) Z \\ & = \beta_0 + \beta_2 Z \end{split} \]
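  • When the other variable equals zero, \(\psi_1 = \beta_1\) and \(\psi_2 = \beta_2\)
  • \(+\beta_3\) and \(-\beta_3\): the sign of \(\beta_3\) determines whether each conditional effect grows or shrinks as the other variable increases
  • \(\psi_1\) and \(\psi_2\): the conditional effects of \(X\) and \(Z\)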
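
As an illustration with purely hypothetical values (not estimates from any data in these slides), suppose \(\beta_1 = 2\) and \(\beta_3 = -0.5\). The conditional effect of \(X\) then shrinks and eventually changes sign as \(Z\) grows:

\[ \begin{split} \psi_1 (Z = 0) & = 2 - 0.5 (0) = 2 \\ \psi_1 (Z = 4) & = 2 - 0.5 (4) = 0 \\ \psi_1 (Z = 6) & = 2 - 0.5 (6) = -1 \end{split} \]

Only at \(Z = 0\) does \(\beta_1\) alone describe the effect of \(X\).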

Conducting inference

  • Obtaining estimates of parameters

    \[\hat{\psi}_1 = \hat{\beta}_1 + \hat{\beta}_3 Z\]

    \[\hat{\psi}_2 = \hat{\beta}_2 + \hat{\beta}_3 X\]
  • Obtaining estimates of standard errors

Conducting inference

  1. \(\text{Var}(aX) = a^2 \text{Var}(X)\)
  2. \(\text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y) + 2 \text{Cov}(X,Y)\)
  3. \(\text{Cov}(X, aY) = a \text{Cov}(X,Y)\)
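
Applying these rules to \(\hat{\psi}_1 = \hat{\beta}_1 + \hat{\beta}_3 Z\), and treating \(Z\) as a fixed value rather than a random variable, gives a sketch of the derivation:

\[ \begin{split} \text{Var}(\hat{\beta}_1 + Z \hat{\beta}_3) & = \text{Var}(\hat{\beta}_1) + \text{Var}(Z \hat{\beta}_3) + 2 \text{Cov}(\hat{\beta}_1, Z \hat{\beta}_3) \\ & = \text{Var}(\hat{\beta}_1) + Z^2 \text{Var}(\hat{\beta}_3) + 2 Z \text{Cov}(\hat{\beta}_1, \hat{\beta}_3) \end{split} \]

The first line uses rule 2; the second uses rules 1 and 3. The same steps, with \(X\) in place of \(Z\) and \(\hat{\beta}_2\) in place of \(\hat{\beta}_1\), apply to \(\hat{\psi}_2\).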

Conducting inference

\[\widehat{\text{Var}(\hat{\psi}_1)} = \widehat{\text{Var} (\hat{\beta}_1)} + Z^2 \widehat{\text{Var} (\hat{\beta}_3)} + 2 Z \widehat{\text{Cov} (\hat{\beta}_1, \hat{\beta}_3)}\]

\[\widehat{\text{Var}(\hat{\psi}_2)} = \widehat{\text{Var} (\hat{\beta}_2)} + X^2 \widehat{\text{Var} (\hat{\beta}_3)} + 2 X \widehat{\text{Cov} (\hat{\beta}_2, \hat{\beta}_3)}\]

  • Both depend on the estimated variances and covariances of \(\hat{\beta}_1\), \(\hat{\beta}_2\), and/or \(\hat{\beta}_3\)
  • Both also depend on the level/value of the interacted variable
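
A minimal sketch in R of how the conditional effect and its standard error could be computed from a fitted model, following the two variance formulas above. The data are simulated, and the names `sim`, `fit`, and `conditional_effect()` are hypothetical and introduced only for illustration; `lm()`, `coef()`, and `vcov()` are standard R functions.

```r
# Simulated data: every name and coefficient value here is hypothetical
set.seed(1234)
n <- 500
sim <- data.frame(x = rnorm(n), z = runif(n, 0, 5))
sim$y <- 1 + 2 * sim$x + 0.5 * sim$z - 0.8 * sim$x * sim$z + rnorm(n)

# y ~ x * z expands to x + z + x:z
fit <- lm(y ~ x * z, data = sim)

# Conditional effect of `main` and its standard error at values `at`
# of the other variable in the interaction
conditional_effect <- function(model, main, interaction, at) {
  b <- coef(model)
  v <- vcov(model)
  estimate <- b[[main]] + b[[interaction]] * at
  variance <- v[main, main] +
    at^2 * v[interaction, interaction] +
    2 * at * v[main, interaction]
  data.frame(at = at, estimate = estimate, std.error = sqrt(variance))
}

psi_1 <- conditional_effect(fit, main = "x", interaction = "x:z", at = 0:5)
print(psi_1)
```

At `at = 0` the function simply returns the coefficient on `x` and its usual standard error; at every other value the covariance term matters.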

Two dichotomous covariates

\[Y = \beta_0 + \beta_1 D_1 + \beta_2 D_2 + \beta_3 D_1 D_2 + e\]

\[ \begin{split} E(Y | D_1 = 0, D_2 = 0) & = \beta_0 \\ E(Y | D_1 = 1, D_2 = 0) & = \beta_0 + \beta_1 \\ E(Y | D_1 = 0, D_2 = 1) & = \beta_0 + \beta_2 \\ E(Y | D_1 = 1, D_2 = 1) & = \beta_0 + \beta_1 + \beta_2 + \beta_3 \\ \end{split} \]
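
The four conditional expectations above can be checked numerically. The following sketch uses simulated data (all names and coefficient values are hypothetical) and compares the coefficient sums from the interaction model against the raw mean of each cell; with two dummies and their product the model is saturated, so the two columns should agree.

```r
# Simulated data with two dummy covariates; all values are hypothetical
set.seed(42)
n <- 400
sim <- data.frame(d1 = rbinom(n, 1, 0.5), d2 = rbinom(n, 1, 0.5))
sim$y <- 10 + 3 * sim$d1 - 2 * sim$d2 + 4 * sim$d1 * sim$d2 + rnorm(n)

fit <- lm(y ~ d1 * d2, data = sim)
b0 <- coef(fit)[["(Intercept)"]]
b1 <- coef(fit)[["d1"]]
b2 <- coef(fit)[["d2"]]
b3 <- coef(fit)[["d1:d2"]]

# Raw mean of y in each of the four cells (rows index d1, columns index d2)
cell <- with(sim, tapply(y, list(d1 = d1, d2 = d2), mean))

comparison <- data.frame(
  d1 = c(0, 1, 0, 1),
  d2 = c(0, 0, 1, 1),
  from_model = c(b0, b0 + b1, b0 + b2, b0 + b1 + b2 + b3),
  cell_mean = c(cell["0", "0"], cell["1", "0"], cell["0", "1"], cell["1", "1"])
)
print(comparison)
```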

One dichotomous and one continuous covariate

\[Y = \beta_0 + \beta_1 X + \beta_2 D + \beta_3 XD + e\]

\[ \begin{split} E(Y | X, D = 0) & = \beta_0 + \beta_1 X \\ E(Y | X, D = 1) & = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X \end{split} \]

Two continuous covariates

\[Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + e\]

Quadratic, cubic, and other polynomial effects

\[Y = \beta_0 + \beta_1 X + \beta_2 X^2 + e\]

\[\frac{\partial E(Y)}{\partial X} = \beta_1 + 2 \beta_2 X\]
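
A quadratic term is an interaction of \(X\) with itself, so the effect of \(X\) depends on the level of \(X\). As a worked illustration with hypothetical values \(\beta_1 = 4\) and \(\beta_2 = -0.5\) (not estimates from these slides):

\[ \begin{split} \frac{\partial E(Y)}{\partial X} \bigg|_{X = 1} & = 4 + 2 (-0.5) (1) = 3 \\ \frac{\partial E(Y)}{\partial X} \bigg|_{X = 4} & = 4 + 2 (-0.5) (4) = 0 \\ \frac{\partial E(Y)}{\partial X} \bigg|_{X = 6} & = 4 + 2 (-0.5) (6) = -2 \end{split} \]

The marginal effect is zero where \(X = -\beta_1 / (2 \beta_2)\), here at \(X = 4\).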

Higher-order interaction terms

\[ \begin{align} Y = \beta_0 &+ \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_1 X_2 \\ & + \beta_5 X_1 X_3 + \beta_6 X_2 X_3 + \beta_7 X_1 X_2 X_3 + e \end{align} \]

Higher-order interaction terms

\[ \begin{align} Y = \beta_0 &+ \beta_1 X + \beta_2 D_1 + \beta_3 D_2 + \beta_4 X D_1 \\ & + \beta_5 X D_2 + \beta_6 D_1 D_2 + \beta_7 X D_1 D_2 + e \end{align} \]

Key rules

  • Don’t omit the “direct effects”
  • Zero should be meaningful
  • Rescaling the variables doesn’t guarantee statistical significance
  • Flexible alternatives
  • Interpreting three(+)-way interactions

Estimating models with multiplicative interactions

  • Obama feeling thermometer (ObamaTherm)
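  • Respondent conservatism (RConserv), scored 1 to 7
  • Obama conservatism (ObamaConserv), scored 1 to 7
  • GOP respondent (GOP), coded 0 or 1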

Obama data

##    ObamaTherm       RConserv     ObamaConserv       GOP      
##  Min.   :  0.0   Min.   :1.00   Min.   :1.00   Min.   :0.00  
##  1st Qu.: 50.0   1st Qu.:2.00   1st Qu.:2.00   1st Qu.:0.00  
##  Median : 75.0   Median :5.00   Median :2.00   Median :0.00  
##  Mean   : 69.6   Mean   :4.24   Mean   :2.98   Mean   :0.24  
##  3rd Qu.:100.0   3rd Qu.:6.00   3rd Qu.:4.00   3rd Qu.:0.00  
##  Max.   :100.0   Max.   :7.00   Max.   :7.00   Max.   :1.00

Basic linear model

##          term estimate std.error statistic  p.value
## 1 (Intercept)     93.4     1.572      59.4 0.00e+00
## 2    RConserv     -4.1     0.368     -11.2 9.48e-28
## 3         GOP    -26.5     1.587     -16.7 2.82e-57
##   r.squared adj.r.squared sigma statistic   p.value df logLik   AIC   BIC
## 1     0.325         0.324  23.1       336 9.28e-120  3  -6365 12738 12759
##   deviance df.residual
## 1   741815        1394

Dichotomous interaction

\[ \begin{align} \text{Obama thermometer} = \beta_0 &+ \beta_1 (\text{Respondent conservatism}) \\ & + \beta_2 (\text{GOP respondent})\\ & + \beta_3 (\text{Respondent conservatism}) (\text{GOP respondent}) \\ & + e \end{align} \]

Dichotomous interaction

##           term estimate std.error statistic  p.value
## 1  (Intercept)    92.25     1.640     56.24 0.00e+00
## 2     RConserv    -3.81     0.388     -9.81 5.26e-22
## 3          GOP   -11.07     6.684     -1.66 9.79e-02
## 4 RConserv:GOP    -2.86     1.201     -2.38 1.75e-02
##   r.squared adj.r.squared sigma statistic   p.value df logLik   AIC   BIC
## 1     0.328         0.326    23       226 1.15e-119  4  -6362 12735 12761
##   deviance df.residual
## 1   738815        1393

Dichotomous interaction

  • GOP = 0

    \[ \begin{align} E(\text{Obama thermometer}) = 92.255 & -3.805 (\text{Respondent conservatism}) -11.069 (0)\\ & -2.856 (\text{Respondent conservatism} \times 0) \\ = 92.255 & -3.805 (\text{Respondent conservatism}) \end{align} \]

  • GOP = 1

    \[ \begin{align} E(\text{Obama thermometer}) & = (92.255 -11.069 (1)) + (-3.805 -2.856 (1)) (\text{Respondent conservatism}) \\ & = 81.186 -6.661 (\text{Respondent conservatism}) \end{align} \]

Separate models

##          term estimate std.error statistic  p.value
## 1 (Intercept)    92.25     1.598      57.7 0.00e+00
## 2    RConserv    -3.81     0.378     -10.1 7.87e-23
##          term estimate std.error statistic  p.value
## 1 (Intercept)    81.19      6.98     11.62 1.90e-26
## 2    RConserv    -6.66      1.22     -5.44 1.04e-07
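
The two tables above match the intercepts and slopes implied by the interaction model, although the standard errors differ somewhat. A sketch with simulated data (all names and values hypothetical) shows why: a fully interacted model reproduces the coefficients of separate regressions on each subgroup, but it pools the residual variance across groups.

```r
# Simulated data loosely shaped like the example; all values are hypothetical
set.seed(2017)
n <- 600
sim <- data.frame(x = runif(n, 1, 7), d = rbinom(n, 1, 0.3))
sim$y <- 90 - 4 * sim$x - 10 * sim$d - 3 * sim$x * sim$d + rnorm(n, sd = 20)

pooled <- lm(y ~ x * d, data = sim)            # single interaction model
sep_d0 <- lm(y ~ x, data = subset(sim, d == 0)) # d = 0 subgroup only
sep_d1 <- lm(y ~ x, data = subset(sim, d == 1)) # d = 1 subgroup only

b <- coef(pooled)
slopes <- c(
  pooled_d0 = b[["x"]],
  separate_d0 = coef(sep_d0)[["x"]],
  pooled_d1 = b[["x"]] + b[["x:d"]],
  separate_d1 = coef(sep_d1)[["x"]]
)
print(round(slopes, 3))

# Standard errors of the d = 0 slope need not match across the two approaches
print(c(pooled = sqrt(vcov(pooled)["x", "x"]),
        separate = sqrt(vcov(sep_d0)["x", "x"])))
```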

Causal direction

Calculating standard errors

\[ \begin{align} \text{Obama thermometer} = \beta_0 &+ (\beta_1 + \beta_3 \text{GOP}) (\text{Respondent conservatism}) \\ & + \beta_2 (\text{GOP respondent}) + e \\ = &\beta_0 + \psi_1 (\text{Respondent conservatism}) + \beta_2 (\text{GOP respondent}) + e \end{align} \]

  • Point estimate

    ## [1] -6.66
  • Standard error

    \[\hat{\sigma}_{\hat{\psi}_1} = \sqrt{\widehat{\text{Var}(\hat{\beta}_1)} + (\text{GOP})^2 \widehat{\text{Var}(\hat{\beta}_3)} + 2 (\text{GOP}) \widehat{\text{Cov}(\hat{\beta}_1, \hat{\beta}_3)}}\]

    ##              (Intercept) RConserv    GOP RConserv:GOP
    ## (Intercept)        2.691   -0.574 -2.691        0.574
    ## RConserv          -0.574    0.151  0.574       -0.151
    ## GOP               -2.691    0.574 44.677       -7.797
    ## RConserv:GOP       0.574   -0.151 -7.797        1.442
    ## [1] 1.14
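
The standard error can be reproduced by hand from the variance-covariance matrix printed above. This sketch plugs the rounded entries into the formula for \(\text{GOP} = 1\); the variable names are chosen here for illustration and are not from the original analysis.

```r
# Rounded entries copied from the variance-covariance matrix above
var_b1 <- 0.151      # Var(RConserv)
var_b3 <- 1.442      # Var(RConserv:GOP)
cov_b1_b3 <- -0.151  # Cov(RConserv, RConserv:GOP)

gop <- 1  # evaluate the conditional effect for GOP respondents

se_psi_1 <- sqrt(var_b1 + gop^2 * var_b3 + 2 * gop * cov_b1_b3)
print(round(se_psi_1, 2))
```

With a fitted model object available, the same entries could be pulled directly from `vcov()` instead of being typed in.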

Hypothesis testing

car::linearHypothesis(obama_ideo_gop, "RConserv + RConserv:GOP")
## Linear hypothesis test
## 
## Hypothesis:
## RConserv  + RConserv:GOP = 0
## 
## Model 1: restricted model
## Model 2: ObamaTherm ~ RConserv * GOP
## 
##   Res.Df    RSS Df Sum of Sq    F  Pr(>F)    
## 1   1394 757039                              
## 2   1393 738815  1     18225 34.4 5.7e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
car::linearHypothesis(obama_ideo_gop, "GOP + 7 * RConserv:GOP")
## Linear hypothesis test
## 
## Hypothesis:
## GOP  + 7 RConserv:GOP = 0
## 
## Model 1: restricted model
## Model 2: ObamaTherm ~ RConserv * GOP
## 
##   Res.Df    RSS Df Sum of Sq   F Pr(>F)    
## 1   1394 821850                            
## 2   1393 738815  1     83036 157 <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
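
Both tests are Wald tests of a single linear combination of coefficients, so each F statistic is the squared ratio of the combination to its standard error. The sketch below recomputes them from the rounded estimates and variance-covariance entries reported earlier; `wald_f()` is a hypothetical helper, and because the inputs are rounded the results should only land close to the reported F values.

```r
# Rounded estimates and (co)variances copied from the earlier output
b <- c(RConserv = -3.805, GOP = -11.069, "RConserv:GOP" = -2.856)
v <- matrix(
  c( 0.151,  0.574, -0.151,
     0.574, 44.677, -7.797,
    -0.151, -7.797,  1.442),
  nrow = 3, byrow = TRUE,
  dimnames = list(names(b), names(b))
)

# F statistic for H0: sum(weights * b) = 0 (one restriction, so F = t^2)
wald_f <- function(weights) {
  estimate <- sum(weights * b)
  variance <- drop(t(weights) %*% v %*% weights)
  (estimate / sqrt(variance))^2
}

# Weights are ordered as RConserv, GOP, RConserv:GOP
f_rconserv_gop1 <- wald_f(c(1, 0, 1))  # RConserv + RConserv:GOP = 0
f_gop_rconserv7 <- wald_f(c(0, 1, 7))  # GOP + 7 * RConserv:GOP = 0
print(round(c(f_rconserv_gop1, f_gop_rconserv7), 1))
```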

Continuous interaction

\[ \begin{align} \text{Obama thermometer} = \beta_0 &+ \beta_1 (\text{Respondent conservatism}) \\ & + \beta_2 (\text{Obama conservatism})\\ & + \beta_3 (\text{Respondent conservatism}) (\text{Obama conservatism}) \\ & + e \end{align} \]

Continuous interaction

##                    term estimate std.error statistic   p.value
## 1           (Intercept)   117.12     2.972     39.41 8.00e-229
## 2              RConserv   -14.94     0.600    -24.88 2.52e-113
## 3          ObamaConserv    -6.73     0.929     -7.25  7.06e-13
## 4 RConserv:ObamaConserv     2.81     0.182     15.40  1.53e-49
##   r.squared adj.r.squared sigma statistic  p.value df logLik   AIC   BIC
## 1     0.451          0.45  20.8       381 8.3e-181  4  -6221 12452 12478
##   deviance df.residual
## 1   603467        1393
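
With two continuous variables there is no single effect of respondent conservatism; it has to be evaluated at chosen values of Obama conservatism. The sketch below does this with the rounded point estimates from the table above. Standard errors are omitted because the variance-covariance matrix for this model is not shown, and the variable names are illustrative only.

```r
# Rounded point estimates copied from the table above
b_rconserv <- -14.94
b_interaction <- 2.81

# Evaluate the conditional effect across the observed 1-7 range
obama_conserv <- 1:7
marginal_effect <- b_rconserv + b_interaction * obama_conserv

print(data.frame(obama_conserv, marginal_effect))
```

The estimated effect of respondent conservatism is negative at low values of Obama conservatism and turns positive near the top of the scale.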

Predicted values plots

Interaction terms in GLMs

\[p(\text{Survival}) = \frac{e^{\beta_0 + \beta_{1}\text{Age} + \beta_{2}\text{Sex}}}{1 + e^{\beta_0 + \beta_{1}\text{Age} + \beta_{2}\text{Sex}}}\]

##          term estimate std.error statistic  p.value
## 1 (Intercept)  1.27727   0.23017      5.55 2.87e-08
## 2         Age -0.00543   0.00631     -0.86 3.90e-01
## 3        Male -2.46592   0.18538    -13.30 2.26e-40
##   null.deviance df.null logLik AIC BIC deviance df.residual
## 1           965     713   -375 756 770      750         711

Interaction terms in GLMs

\[p(\text{Survival}) = \frac{e^{\beta_0 + \beta_{1}\text{Age} + \beta_{2}\text{Sex} + \beta_3 (\text{Age} \times \text{Sex})}}{1 + e^{\beta_0 + \beta_{1}\text{Age} + \beta_{2}\text{Sex} + \beta_3 (\text{Age} \times \text{Sex})}}\]

##          term estimate std.error statistic p.value
## 1 (Intercept)   0.5938    0.3103      1.91 0.05569
## 2         Age   0.0197    0.0106      1.86 0.06240
## 3        Male  -1.3178    0.4084     -3.23 0.00125
## 4    Age:Male  -0.0411    0.0136     -3.03 0.00241
##   null.deviance df.null logLik AIC BIC deviance df.residual
## 1           965     713   -370 748 767      740         710
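
Coefficients in a logistic regression are on the log-odds scale, so the interaction is easier to read as predicted probabilities. This sketch plugs the rounded estimates from the table above into the inverse logit, `plogis()`. It assumes `Male` is coded 1 for men and 0 for women, which the output suggests but does not state; `predict_survival()` is a hypothetical helper.

```r
# Rounded estimates copied from the interaction model above
b <- c(intercept = 0.5938, age = 0.0197, male = -1.3178, age_male = -0.0411)

# Predicted probability of survival; male is assumed coded 1 = male, 0 = female
predict_survival <- function(age, male) {
  eta <- b[["intercept"]] + b[["age"]] * age +
    b[["male"]] * male + b[["age_male"]] * age * male
  plogis(eta)  # inverse logit: exp(eta) / (1 + exp(eta))
}

grid <- expand.grid(age = c(10, 30, 50), male = c(0, 1))
grid$p_survival <- predict_survival(grid$age, grid$male)
print(grid)
```

Under these estimates the predicted probability rises with age for women and falls with age for men, which is the pattern the negative `Age:Male` coefficient encodes.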

Nonparametric regression

\[ \begin{align} \text{Obama thermometer} = \beta_0 &+ f_1 (\text{Respondent conservatism}) \\ & + f_2 (\text{Obama conservatism}) + e \end{align} \]

Nonparametric regression

\[ \begin{align} \text{Obama thermometer} = \beta_0 &+ f_1 (\text{Respondent conservatism}) + f_2 (\text{Obama conservatism})\\ & + f_3 (\text{Respondent conservatism} \times \text{Obama conservatism}) + e \end{align} \]

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## ObamaTherm ~ s(RConserv, ObamaConserv, k = 5)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   69.628      0.555     125   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                           edf Ref.df   F p-value    
## s(RConserv,ObamaConserv) 3.98      4 290  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.453   Deviance explained = 45.5%
## GCV = 432.19  Scale est. = 430.65    n = 1397

Decision trees

Support vector machines